Identifying collocations using cross-lingual association measures

Authors

  • Lis Pereira
  • Elga Strafella
  • Kevin Duh
  • Yuji Matsumoto
Abstract

We introduce a simple and effective cross-lingual approach to identifying collocations. This approach is based on the observation that true collocations, which cannot be translated word for word, will exhibit very different association scores before and after literal translation. Our experiments in Japanese demonstrate that our cross-lingual association measure can successfully exploit the combination of a bilingual dictionary and large monolingual corpora, outperforming monolingual association measures.
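The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula, so everything below is an assumption: pointwise mutual information (PMI) is used as the monolingual association score, the candidate is scored as the gap between its source-language PMI and the best PMI among its literal translations, and the function names, the `floor` parameter, the toy dictionary, and all counts are hypothetical and invented for illustration only.

```python
import math
from itertools import product


def pmi(pair_count, left_count, right_count, total_pairs):
    """Pointwise mutual information (base 2); -inf if anything is unseen."""
    if pair_count == 0 or left_count == 0 or right_count == 0:
        return float("-inf")
    return math.log2((pair_count * total_pairs) / (left_count * right_count))


def corpus_pmi(pair, pair_counts, word_counts, total_pairs):
    """PMI of a (noun, verb) pair looked up in one corpus's count tables."""
    left, right = pair
    return pmi(pair_counts.get(pair, 0),
               word_counts.get(left, 0),
               word_counts.get(right, 0),
               total_pairs)


def cross_lingual_gap(src_pair, src_stats, tgt_stats, dictionary, floor=0.0):
    """Source PMI minus the best PMI of any word-for-word translation.

    ASSUMPTION: this scoring rule is illustrative, not the paper's measure.
    `floor` clips very low or undefined target scores so the gap stays finite.
    """
    src_score = corpus_pmi(src_pair, *src_stats)
    # Every combination of dictionary translations is a literal candidate.
    candidates = product(dictionary.get(src_pair[0], []),
                         dictionary.get(src_pair[1], []))
    tgt_scores = [corpus_pmi(c, *tgt_stats) for c in candidates]
    best = max(tgt_scores, default=float("-inf"))
    return src_score - max(best, floor)


# Toy, made-up statistics: (pair_counts, word_counts, total_pairs).
src_stats = (
    {("kusuri", "nomu"): 20, ("mizu", "nomu"): 30},
    {"kusuri": 40, "mizu": 60, "nomu": 100},
    1000,
)
tgt_stats = (
    {("water", "drink"): 30, ("medicine", "drink"): 1},
    {"water": 60, "medicine": 40, "drug": 50, "drink": 100},
    1000,
)
# Hypothetical bilingual dictionary (romanized Japanese to English).
dictionary = {"kusuri": ["medicine", "drug"], "mizu": ["water"], "nomu": ["drink"]}

gap_kusuri = cross_lingual_gap(("kusuri", "nomu"), src_stats, tgt_stats, dictionary)
gap_mizu = cross_lingual_gap(("mizu", "nomu"), src_stats, tgt_stats, dictionary)
print(f"kusuri-nomu gap: {gap_kusuri:.2f}")
print(f"mizu-nomu gap:   {gap_mizu:.2f}")
```

With these toy counts, the pair whose literal translation is weakly associated in the target corpus ("drink medicine") receives a larger gap than the pair that translates word for word ("drink water"). A real system would need counts from large monolingual corpora and a full bilingual dictionary, as the abstract describes.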

Similar resources

A Corpus-based Analysis of Collocational Errors in the Iranian EFL Learners' Oral Production

Collocations are generally considered one of the problematic areas for EFL learners. Iranian learners of English, like other EFL learners, face various problems in producing oral collocations. An analysis of learners' spoken interlanguage indicates both the scope of the problem and the need for learners to spend more time and energy on mastering collocations. The present study specifically f...

Retrieving Bilingual Verb-Noun Collocations by Integrating Cross-Language Category Hierarchies

This paper presents a method of retrieving bilingual collocations of a verb and its objective noun from cross-lingual documents with similar contents. Relevant documents are obtained by integrating cross-language hierarchies. The results showed a 15.1% improvement over the baseline non-hierarchy model, and a 6.0% improvement over use of relevant documents retrieved from a single hierarchy. Moreov...

Can we do better than frequency? A case study on extracting PP-verb collocations

We argue that lexical association measures (AMs) should be evaluated against a reference set of collocations manually extracted from the full candidate data, and that the notion of collocation needs to be precisely defined so that human collocativity judgments and experimental results are reproducible. We show that identification results achieved by particular AMs do not crucially depend on tex...

Empirical Implications on Lexical

An empirical study is presented showing how factors such as co-occurrence frequency, linguistic constraints in the candidate data and type of collocation to be identified influence the identification accuracy achieved, on the one hand, by a mere frequency-based approach and, on the other hand, by well-known statistical association measures such as mutual information, Dice coefficient, relative entro...

Conditional Random Fields for Spanish Named Entity Recognition Using Unsupervised Features

Unsupervised features based on word representations such as word embeddings and word collocations have been shown to significantly improve supervised NER for English. In this work we investigate whether such unsupervised features can also boost supervised NER in Spanish. To do so, we use word representations and collocations as additional features in a linear chain Conditional Random Field (CRF) cla...

Publication date: 2014